High-speed precoders for communication systems

ABSTRACT

The invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing of multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to, for example, operate 10 Gigabit Ethernet over copper cable rather than fiber optic cable.

This application claims the benefit of U.S. Provisional Application No. 60/609,289 to Parhi et al., entitled “PIPELINED AND PARALLEL TOMLINSON-HARASHIMA PRECODERS,” filed Sep. 13, 2004, and U.S. Provisional Application No. 60/715,672, to Parhi et al., entitled “PIPELINED AND PARALLEL TOMLINSON-HARASHIMA PRECODERS,” filed Sep. 9, 2005, the entire contents of each being incorporated herein by reference.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

The invention was made with Government support from the National Science Foundation, No. CCF-0429979. The Government may have certain rights in this invention.

TECHNICAL FIELD

The invention relates to computer networks and, more specifically, to precoding data for transmission across computer networks.

BACKGROUND

Tomlinson-Harashima Precoding (THP) was invented by Tomlinson and Harashima for channel equalization and has been widely used in DSL systems and in voiceband and cable modems. Unlike decision feedback equalization, where channel equalization takes place at the receiver side, THP is a transmitter technique in which equalization is performed at the transmitter side. It may eliminate error propagation and allow current capacity-achieving channel codes, such as low-density parity-check (LDPC) codes, to be used in a natural way.

THP converts an inter-symbol interference (ISI) channel into a near additive white Gaussian noise (AWGN) channel and allows the system to take full advantage of current capacity-achieving error correction codes. Like decision feedback equalizers, Tomlinson-Harashima (TH) precoders contain nonlinear feedback loops, which limit their use in high-speed applications. The speed of TH precoders is limited by the sum of the computation times of two additions and one multiplication. Unlike decision feedback equalization, where the output levels of the nonlinear devices (quantizers) are finite, in TH precoders the output levels of the modulo devices are infinite, or finite but very large. Thus, look-ahead and pre-computation techniques, which were successfully applied in the past to pipeline Decision Feedback Equalizers (DFEs), are difficult to apply to pipeline TH precoders.

Recently, TH precoding has been proposed for 10 Gigabit Ethernet over copper. The symbol rate of 10GBASE-T is expected to be around 800 Mbaud. However, because TH precoders contain feedback loops, they are hard to clock at such high speeds. Thus, the high-speed design of TH precoders is of great interest.

SUMMARY

In general, the invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing of multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to, for example, operate 10 Gigabit Ethernet over copper cable rather than fiber optic cable.

In one embodiment, a precoder comprises a plurality of pipelined computation units to produce a precoded symbol, wherein one of the computation units performs a modulo operation and feeds back a compensation signal for use as an input to the modulo operation for precoding a subsequent symbol.

In another embodiment, a method comprises performing a modulo operation within one of a plurality of pipelined computational units of a pipelined precoder to produce a precoded symbol, wherein the modulo operation feeds back a compensation signal for use as an input to the modulo operation for precoding a subsequent symbol, and sending a network communication in accordance with the precoded symbol.

In another embodiment, a parallel precoder comprises a plurality of computation units to output signals for at least two precoded symbols in parallel, wherein the computation units include a first modulo operation unit and a second modulo operation unit, and wherein the first modulo operation unit performs a modulo operation for precoding a first one of the symbols and forwards a compensation signal to the second modulo operation unit for use as an input to a modulo operation for precoding a second one of the symbols.

In another embodiment, a method comprises performing, in parallel, at least two modulo operations for outputting at least two precoded symbols, wherein at least one of the modulo operations produces a compensation signal for use in precoding a subsequent symbol, and outputting a network communication in accordance with the precoded symbols.

In another embodiment, a transceiver for network communications comprises a parallel Tomlinson-Harashima precoder that performs at least two modulo operations in parallel for outputting at least two precoded symbols.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an exemplary network communication system.

FIGS. 2A-2F are block diagrams illustrating an exemplary process of pipelining a finite impulse response (FIR) TH precoder.

FIG. 3 is a block diagram of a first exemplary pipelined precoder.

FIG. 4 is a block diagram of a second exemplary pipelined precoder.

FIG. 5 is a block diagram of an exemplary modified precoder pipelined architecture.

FIGS. 6A-6D are block diagrams illustrating an exemplary process of pipelining an infinite impulse response (IIR) TH precoder.

FIG. 7A is a block diagram of an exemplary 3-level parallel IIR filter.

FIG. 7B is a block diagram of an exemplary 3-level parallel TH precoder.

FIG. 7C is a block diagram of an exemplary equivalent form 3-level parallel TH precoder.

FIGS. 8A and 8B are block diagrams of exemplary parallel TH precoder designs.

FIG. 9 is a block diagram of an exemplary architecture of a parallel IIR filter.

FIGS. 10 and 11 are block diagrams of exemplary TH precoder architectures corresponding with the parallel IIR filter of FIG. 9.

FIGS. 12 and 13 are block diagrams of exemplary receiver architectures with parallel precoders.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of an exemplary network communication system 2. For purposes of the present description, communication system 2 will be assumed to be a 10 Gigabit Ethernet over copper network. Although the system will be described with respect to 10 Gigabit Ethernet over copper, it shall be understood that the present invention is not limited in this respect, and that the techniques described herein are not dependent upon the properties of the network. For example, communication system 2 could also be implemented within networks of various configurations utilizing one of many protocols without departing from the scope of the present invention.

In the example of FIG. 1, communication system 2 includes transmitter 6 and receiver 14. Transmitter 6 comprises encoder 10 and precoder 8, which encode and precode outbound data 4, respectively, for transmission via network connection 12. Outbound data 4 may take the form of a stream of symbols for transmission to receiver 14. Once receiver 14 receives the transmitted data, decoder 16 decodes the data, resulting in decoded data 18, which may represent a stream of estimated symbols. In some cases, decoded data 18 may then be utilized by applications within a network device that includes receiver 14.

In one embodiment, transmitter 6, located within a first network device (not shown), may transmit data to receiver 14, which may be located within a second network device (not shown). The first network device may also include a receiver substantially similar to receiver 14. The second network device may also include a transmitter substantially similar to transmitter 6. In this way, the first and second network devices may achieve two-way communication with each other or with other network devices. Examples of network devices that may incorporate transmitter 6 or receiver 14 include desktop computers, laptop computers, network-enabled personal digital assistants (PDAs), digital televisions, or network appliances generally.

Precoder 8 may be a high-speed precoder, such as a pipelined Tomlinson-Harashima (TH) precoder or a parallel TH precoder. Utilization of high-speed TH precoders may enable network providers to operate 10 Gigabit Ethernet with copper cable. For example, network providers may operate existing copper cable networks at higher speeds without having to incur the expense of converting copper cables to more expensive media, such as fiber optic cables. Furthermore, in certain embodiments of the invention, the high-speed TH precoder design may reduce hardware overhead of the precoder. Although the invention will be described with respect to TH precoders, it shall be understood that the present invention is not limited in this respect, and that the techniques described herein may apply to other types of precoders.

FIGS. 2A-2F are block diagrams illustrating an exemplary process of pipelining a finite impulse response (FIR) TH precoder. FIG. 2A is a block diagram of FIR TH precoder 20. FIR TH precoder 20 comprises a nonlinear modulo device 22 in the feedforward path of the pre-equalizer to limit the output dynamic range.

FIG. 2B is a block diagram of an equivalent form precoder 26, which is an equivalent form of FIR TH precoder 20 expressed as an infinite impulse response (IIR) filter. Modulo device 22 (FIG. 2A) is removed from equivalent form precoder 26 illustrated in FIG. 2B. Instead, a unique compensation signal v(k) 30, which is a multiple of 2M, is added to the transmitted pulse amplitude modulated (PAM) signal x(k) 28 such that the output t(k) 32 of the precoder is limited to the interval [−M, M]. Thus, the effective transmitted data sequence in the z-domain is:

$t(z) = \frac{x(z) + v(z)}{H(z)}. \qquad (1)$
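As a rough illustration of equation (1), the Python sketch below implements a straightforward (non-pipelined) FIR TH precoder for a channel H(z) = 1 + h_1z^{-1} + h_2z^{-2} and checks that every output t(k) stays within the modulo range, that the noise-free received signal H(z)t(z) equals x(z) + v(z) with v a multiple of 2M, and that a modulo at the receiver returns x. The half-open interval [−M, M), the helper names (mod2m, th_precode, channel) and the tap values are assumptions made only for this example.

```python
import random

def mod2m(a, M):
    """Modulo device: fold a into [-M, M) by adding a multiple of 2M."""
    return (a + M) % (2 * M) - M

def th_precode(x, h, M):
    """Straightforward TH precoder: t(k) = mod(x(k) - sum_j h_j t(k-j)).

    h holds h_1..h_L (the leading tap h_0 = 1 is implicit).
    Returns the precoded symbols t and the compensation signal v.
    """
    t, v = [], []
    for k, xk in enumerate(x):
        feedback = sum(hj * t[k - j] for j, hj in enumerate(h, start=1) if k - j >= 0)
        pre = xk - feedback
        tk = mod2m(pre, M)
        t.append(tk)
        v.append(tk - pre)          # multiple of 2M, up to rounding error
    return t, v

def channel(t, h):
    """Noise-free FIR channel: r(k) = t(k) + sum_j h_j t(k-j)."""
    return [t[k] + sum(hj * t[k - j] for j, hj in enumerate(h, start=1) if k - j >= 0)
            for k in range(len(t))]

M = 4                                   # PAM-4 symbols from {-3, -1, 1, 3}
h = [0.5, -0.3]                         # hypothetical taps h_1, h_2
rng = random.Random(0)
x = [rng.choice([-3, -1, 1, 3]) for _ in range(200)]

t, v = th_precode(x, h, M)
r = channel(t, h)                       # equals x + v, as equation (1) implies
x_hat = [round(mod2m(rk, M)) for rk in r]  # receiver modulo followed by rounding
print(max(abs(tk) for tk in t) <= M, x_hat == x)
```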

As described herein, the next step of the pipelining process is to replace IIR filter $\frac{1}{H(z)}$ 34 (FIG. 2B) with a pipelined version. There are many ways to pipeline the IIR filter $\frac{1}{H(z)}$; for example, either a clustered look-ahead approach or a scattered look-ahead approach may be utilized. In both approaches, the pipelined filter H_(p)(z) is obtained by multiplying both the numerator and the denominator of the transfer function of the original IIR filter by an appropriate polynomial $N(z) = 1 + \sum_{i=1}^{L_N} n_i z^{-i}$:

$H_p(z) = \frac{N(z)}{H(z)N(z)} = \frac{N(z)}{D(z)} \qquad (2)$

The pipelined filter H_(p)(z) consists of two parts: an FIR filter N(z) and an all-pole pipelined IIR filter $\frac{1}{D(z)}$. In the case of the clustered look-ahead approach, D(z) may be expressed in the form

$D(z) = 1 + z^{-K} \sum_{i=0}^{K+L_H} d_i z^{-i} \qquad (3)$

and, for the scattered look-ahead approach, D(z) may be represented as

$D(z) = 1 + \sum_{i=1}^{L_H} d_i z^{-iK} \qquad (4)$

where K is the pipelining level and L_H is the order of H(z).
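A minimal sketch of how a clustered look-ahead polynomial might be obtained, assuming the common construction in which N(z) is the impulse response of 1/H(z) truncated to its first K terms. With that choice, D(z) = H(z)N(z) has no terms in z^{-1} through z^{-(K-1)}, which is the property that lets the loop of 1/D(z) be pipelined. The function names and example taps are hypothetical.

```python
def poly_mul(a, b):
    """Product of two polynomials in z^-1 given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def inverse_taps(H, n_terms):
    """First n_terms impulse-response samples g_0, g_1, ... of 1/H(z), with H[0] == 1."""
    g = []
    for k in range(n_terms):
        acc = 1.0 if k == 0 else 0.0
        for j in range(1, min(k, len(H) - 1) + 1):
            acc -= H[j] * g[k - j]
        g.append(acc)
    return g

def clustered_lookahead(H, K):
    """Return (N, D) with D = H * N and the z^-1 .. z^-(K-1) taps of D equal to zero."""
    N = inverse_taps(H, K)
    return N, poly_mul(H, N)

H = [1.0, 0.5, -0.3]        # hypothetical H(z) = 1 + 0.5 z^-1 - 0.3 z^-2
N, D = clustered_lookahead(H, 3)
print("N =", N)
print("D =", D)             # D[1] and D[2] are zero up to rounding
```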

FIG. 2C is a block diagram of an exemplary pipelined equivalent form 40. Replacing IIR filter $\frac{1}{H(z)}$ 34 (FIG. 2B) with its pipelined version $H_p(z) = \frac{N(z)}{D(z)}$ from equation (2) results in the design illustrated in FIG. 2C.

FIG. 2D is a block diagram of an exemplary FIR TH precoder architecture 50. The design of FIR TH precoder architecture 50 was generated from pipelined equivalent form 40 (FIG. 2C) by returning nonlinear modulo device 22 (first introduced in FIG. 2A) to the feedforward path, which removes input v(k) 30 from all-pole IIR filter $\frac{1}{D(z)}$ 52.

FIG. 2E is a block diagram of another exemplary FIR TH precoder architecture 60, distinguished from architecture 50 of FIG. 2D by sending a delayed version of compensation signal v(k) 30 to the input of FIR filter N_(e)(z) 62. Signal v(k) 30 is delayed by delay element 64 illustrated in FIG. 2E. Delay element 64 may be any type of electronic delaying device.

The modifications made in the pipelined design process illustrated in FIGS. 2A-2F must preserve the functionality of a straightforward TH precoder. The term “straightforward” refers to a non-pipelined TH precoder and is used in this sense throughout this detailed description. Equations (5)-(12) show that the design of FIR TH precoder architecture 60 maintains the functionality of a straightforward (non-pipelined) TH precoder. First, N_(e)(z) 62 (FIG. 2E) is defined as

$N_e(z) = \sum_{i=1}^{L_N} n_i z^{-i+1} = z\left(N(z) - 1\right). \qquad (5)$

Next, signal e(z) 66 (FIG. 2E) is defined as

$e(z) = N(z)x(z) + N_e(z)\left(z^{-1}v(z)\right) + \left(1 - D(z)\right)t(z) = N(z)x(z) + \left(N(z) - 1\right)v(z) + \left(1 - H(z)N(z)\right)t(z). \qquad (6)$

The modulo operation may be described by

$t(z) = e(z) + v(z), \qquad (7)$

thus

$e(z) = t(z) - v(z). \qquad (8)$

Substituting equation (8) into (6) results in

$t(z) - v(z) = N(z)x(z) + \left(N(z) - 1\right)v(z) + \left(1 - H(z)N(z)\right)t(z). \qquad (9)$

Cancelling t(z) and −v(z) on both sides of equation (9) leaves H(z)N(z)t(z) = N(z)(x(z) + v(z)), so t(z) is expressed as

$t(z) = \frac{N(z)\left(x(z) + v(z)\right)}{H(z)N(z)} = \frac{x(z) + v(z)}{H(z)}. \qquad (10)$

At the receiver side, the received signal r(z) is given by

$r(z) = H(z)t(z) = H(z)\frac{x(z) + v(z)}{H(z)} = x(z) + v(z). \qquad (11)$

As in a straightforward TH precoder, the received signal consists of the PAM-M signal x(z) and the compensation signal v(z), which is a multiple of 2M. Thus, the design in FIG. 2E keeps the same functionality as a TH precoder. In the presence of additive noise, the received signal becomes

$r(z) = H(z)t(z) + n(z) = H(z)\frac{x(z) + v(z)}{H(z)} + n(z) = x(z) + v(z) + n(z). \qquad (12)$

Thus, the error probability performance is the same as that of a straightforward implementation.
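The equivalence argued in equations (5)-(12) can also be checked numerically. The Python sketch below runs a straightforward TH precoder next to an arithmetic model of the FIG. 2E structure, in which e(k) is built from N(z)x(z), N_e(z)z^{-1}v(z) = (N(z) − 1)v(z) and (1 − D(z))t(z) as in equation (6), and the modulo device produces t(k) and v(k) as in equations (7) and (8). It assumes the polynomial N(z) = 1 − h_1z^{-1} + h_2z^{-2} that is later used for a 2-level scattered look-ahead design, the interval [−M, M), and hypothetical channel taps; it models the arithmetic only, not the pipeline registers or timing.

```python
import random

def mod2m(a, M):
    """Modulo device: fold a into [-M, M)."""
    return (a + M) % (2 * M) - M

def poly_mul(a, b):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def past(seq, k, i):
    """seq[k - i], treating samples before time zero as 0."""
    return seq[k - i] if k - i >= 0 else 0.0

def th_straightforward(x, H, M):
    """Non-pipelined TH precoder for H = [1, h_1, ..., h_L]."""
    t = []
    for k in range(len(x)):
        fb = sum(H[j] * past(t, k, j) for j in range(1, len(H)))
        t.append(mod2m(x[k] - fb, M))
    return t

def th_pipelined(x, H, N, M):
    """Arithmetic model of FIG. 2E: e = N x + (N - 1) v + (1 - D) t, t = mod(e)."""
    D = poly_mul(H, N)
    t, v = [], []
    for k in range(len(x)):
        e = sum(N[i] * past(x, k, i) for i in range(len(N)))      # N(z) x(z)
        e += sum(N[i] * past(v, k, i) for i in range(1, len(N)))  # (N(z) - 1) v(z)
        e -= sum(D[j] * past(t, k, j) for j in range(1, len(D)))  # (1 - D(z)) t(z)
        tk = mod2m(e, M)
        # v(k) = t(k) - e(k) is a multiple of 2M; snapping it stops rounding drift.
        v.append(2 * M * round((tk - e) / (2 * M)))
        t.append(tk)
    return t

M = 4
h1, h2 = 0.4731, -0.2689                # hypothetical channel taps
H = [1.0, h1, h2]
N = [1.0, -h1, h2]                      # 2-level scattered look-ahead polynomial
rng = random.Random(1)
x = [rng.choice([-3, -1, 1, 3]) for _ in range(500)]

t_ref = th_straightforward(x, H, M)
t_pip = th_pipelined(x, H, N, M)
print(max(abs(a - b) for a, b in zip(t_ref, t_pip)))
```

A difference on the order of floating-point rounding indicates that both structures emit the same precoded sequence.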

As illustrated in FIG. 2E, there are two main nonlinear feedback loops in the design. One is the pipelined loop containing the FIR filter 1−D(z). The other is the non-pipelined nonlinear loop containing the FIR filter N_(e)(z). The speed of the design in FIG. 2E is limited by the non-pipelined loop. However, as with the feedback loops in decision feedback equalizers (DFEs), the compensation signal v(k) 30 in the non-pipelined loop takes only a finite number of different values. Thus, all possible outputs of the FIR filter N_(e)(z) 62 may be pre-computed, as in pre-computation techniques for quantizers that are known in the art.

FIG. 2F is a block diagram of an exemplary precoder pipelined architecture 70. In this example, N_(e)(z) 62 (FIG. 2E) is assumed to have only two taps, and it is therefore replaced by all pre-computed values 72 in TH precoder pipelined architecture 70.
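To make the pre-computation idea concrete, the sketch below tabulates every possible output n_1v(k−1) + n_2v(k−2) of a two-tap N_e(z) and then picks one entry using the current compensation values, which models what a multiplexer would do in hardware. The set of compensation values V = {−2M, 0, 2M} and the tap values are assumptions for illustration; the actual set of values that v(k) can take depends on the channel and on M.

```python
from itertools import product

def precompute_ne_outputs(n1, n2, v_levels):
    """All possible outputs n1*v(k-1) + n2*v(k-2) of a two-tap N_e(z)."""
    return {(v1, v2): n1 * v1 + n2 * v2 for v1, v2 in product(v_levels, repeat=2)}

def select(table, v_km1, v_km2):
    """Multiplexer model: the current compensation values address one stored entry."""
    return table[(v_km1, v_km2)]

M = 4
V = (-2 * M, 0, 2 * M)       # assumed set of compensation values
n1, n2 = -0.5, -0.3          # hypothetical taps of N_e(z)
table = precompute_ne_outputs(n1, n2, V)

print(len(table))            # |V| ** 2 stored values for two taps
print(select(table, 8, -8))  # same as n1 * 8 + n2 * (-8)
```

The table size grows as |V| raised to the number of taps, which is the hardware overhead discussed below.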

In a first example of a pipelined precoder, the channel transfer function is H(z)=1+h₁z⁻¹+h₂z⁻². The transfer function H_(e)(z) of a zero-forcing pre-equalizer is

$H_e(z) = \frac{1}{H(z)} = \frac{1}{1 + h_1 z^{-1} + h_2 z^{-2}}. \qquad (13)$

A 2-level scattered look-ahead pipelined design of the IIR filter H_(e)(z) may be obtained by multiplying both the numerator and the denominator of H_(e)(z) by N(z)=1−h₁z⁻¹+h₂z⁻²:

$H_p(z) = \frac{1 - h_1 z^{-1} + h_2 z^{-2}}{1 + \left(2h_2 - h_1^2\right) z^{-2} + h_2^2 z^{-4}} \qquad (14)$

Applying the techniques of FIGS. 2A-2F to this first example yields an exemplary pipelined precoder design.
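Equation (14) can be checked by multiplying the polynomials numerically. The sketch below, using arbitrary example taps, confirms that the odd-power coefficients of H(z)N(z) cancel and that the remaining coefficients are 2h_2 − h_1^2 and h_2^2, so the loop of 1/D(z) contains only z^{-2} delays.

```python
def poly_mul(a, b):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

h1, h2 = 0.5, -0.3                    # hypothetical channel taps
H = [1.0, h1, h2]                     # H(z) = 1 + h1 z^-1 + h2 z^-2
N = [1.0, -h1, h2]                    # N(z) = 1 - h1 z^-1 + h2 z^-2, i.e. H(-z)
D = poly_mul(H, N)

expected = [1.0, 0.0, 2 * h2 - h1 ** 2, 0.0, h2 ** 2]   # denominator of equation (14)
print(D)
print(all(abs(d - e) < 1e-12 for d, e in zip(D, expected)))
```

For K = 2 this choice of N(z) is simply H(−z), which is why every odd power of z^{-1} cancels.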

FIG. 3 is a block diagram of a first exemplary pipelined precoder 80, which results from applying the techniques of FIGS. 2A-2F to the first example described above. The iteration bound T_(∞) of pipelined precoder 80 is given by

$T_{\infty} = \max\left\{ \frac{3T_a + T_{mod} + T_m}{2},\; T_a + T_{mod} + T_{mux} \right\} \qquad (15)$

where T_(a), T_(m), and T_(mod) are the computation times of an addition, a multiplication, and a modulo operation, respectively, and T_(mux) is the operation time of a multiplexer. Although described with respect to a multiplexer, other types of selection elements may be used. Assuming T_(m) dominates the computation time, pipelined precoder 80 may achieve a speedup of 2. In general, for a K-level scattered pipelined design, the iteration bound is given by

$T_{\infty} = \max\left\{ \frac{3T_a + T_{mod} + T_m}{K},\; T_a + T_{mod} + T_{mux} \right\}. \qquad (16)$

For a large enough pipelining level K, the speed is limited by the term T_(a)+T_(mod)+T_(mux).

In a second example of a pipelined precoder, the iteration bound of pipelined precoder 80 is improved by reformulating the design of pipelined precoder 80.

FIG. 4 is a block diagram of a second exemplary pipelined precoder 90. The iteration bound of pipelined precoder 90 is given by

$T_{\infty} = \max\left\{ \frac{3T_a + T_{mod} + T_m + L_N T_{mux}}{K},\; T_{mod} + T_{mux} \right\} \qquad (17)$

For a large K, the speed of pipelined precoder 90 is limited by the term T_(mod)+T_(mux).

For a clustered K-level pipelined design, the iteration bounds in equations (16) and (17) become

$T_{\infty} = \max_{i = 0, 1, \ldots, L_H - 2} \left\{ \frac{(3 + i)T_a + T_{mod} + T_m}{K + 1},\; T_a + T_{mod} + T_{mux} \right\} \qquad (18)$

and

$T_{\infty} = \max_{i = 0, 1, \ldots, L_H - 2} \left\{ \frac{(3 + i)T_a + T_{mod} + T_m + L_N T_{mux}}{K + 1},\; T_{mod} + T_{mux} \right\}, \qquad (19)$

respectively.
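The iteration-bound expressions (16)-(19) are easy to evaluate for candidate pipelining levels. The sketch below does so with hypothetical delays (T_a, T_m, T_mod and T_mux in arbitrary time units) to show how each bound stops improving once its non-pipelined term dominates; the timing numbers are placeholders, not measurements of any circuit.

```python
def bound_scattered_fig3(K, Ta, Tm, Tmod, Tmux):
    """Equation (16): K-level scattered look-ahead, architecture of FIG. 3."""
    return max((3 * Ta + Tmod + Tm) / K, Ta + Tmod + Tmux)

def bound_scattered_fig4(K, LN, Ta, Tm, Tmod, Tmux):
    """Equation (17): reformulated architecture of FIG. 4."""
    return max((3 * Ta + Tmod + Tm + LN * Tmux) / K, Tmod + Tmux)

def bound_clustered(K, LH, Ta, Tm, Tmod, Tmux, LN=None):
    """Equation (18) when LN is None, equation (19) otherwise; requires LH >= 2."""
    extra = 0 if LN is None else LN * Tmux
    floor = Ta + Tmod + Tmux if LN is None else Tmod + Tmux
    loops = [((3 + i) * Ta + Tmod + Tm + extra) / (K + 1) for i in range(LH - 1)]
    return max(loops + [floor])

timing = dict(Ta=1.0, Tm=4.0, Tmod=1.0, Tmux=0.5)   # hypothetical delays
for K in (1, 2, 4, 8, 16):
    print(K, bound_scattered_fig3(K, **timing), bound_scattered_fig4(K, LN=2, **timing))
```

With these placeholder numbers, the FIG. 3 bound levels off at T_a + T_mod + T_mux, while the FIG. 4 bound keeps falling until it reaches T_mod + T_mux.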

One drawback of precoder pipelined architecture 70 (FIG. 2F) is its hardware overhead. The overhead due to pre-computation grows exponentially with the number of taps of the FIR filter N_(e)(z), so when the number of taps is large, the hardware overhead is formidable. To reduce the overhead, N_(e)(z) may be rewritten in two parts:

$N_e(z) = N_{e1}(z) + z^{-(L_{Ne1} - 1)} N_{e2}(z), \quad \text{where} \quad N_{e1}(z) = \sum_{i=1}^{L_{Ne1}} n_i z^{-(i-1)} \quad \text{and} \quad N_{e2}(z) = \sum_{i=L_{Ne1}+1}^{L_N} n_i z^{-(i - L_{Ne1})}. \qquad (20)$

Precoder pipelined architecture 70 (FIG. 2F) may then be redrawn accordingly.

FIG. 5 is a block diagram of an exemplary modified precoder pipelined architecture 100, which requires less hardware than precoder pipelined architecture 70 (FIG. 2F). For a high-speed design, only the output of the FIR filter N_(e1)(z) needs to be pre-computed.
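The decomposition in equation (20) and the resulting saving can be sketched as follows. The code splits the coefficients of N_e(z) at L_Ne1, rebuilds N_e1(z) + z^{-(L_Ne1 − 1)}N_e2(z) to confirm that nothing is lost, and compares table sizes when all of N_e(z) is pre-computed versus only N_e1(z). The simple counting model (|V| possible compensation values per tap, with |V| = 3) and the tap values are hypothetical.

```python
def split_ne(n, L1):
    """Split N_e(z) = sum_{i=1}^{L_N} n_i z^-(i-1) per equation (20).

    n holds n_1..n_{L_N}; returned lists are indexed by the power of z^-1.
    """
    ne1 = list(n[:L1])                 # n_1 .. n_{L1} at powers 0 .. L1-1
    ne2 = [0.0] + list(n[L1:])         # n_{L1+1} .. n_{L_N} at powers 1 .. L_N-L1
    return ne1, ne2

def recombine(ne1, ne2, L1):
    """Form N_e1(z) + z^-(L1-1) N_e2(z) as a coefficient list."""
    size = max(len(ne1), L1 - 1 + len(ne2))
    out = [0.0] * size
    for p, c in enumerate(ne1):
        out[p] += c
    for q, c in enumerate(ne2):
        out[q + L1 - 1] += c
    return out

n = [0.8, -0.4, 0.2, 0.1, -0.05, 0.02]   # hypothetical n_1..n_6 (L_N = 6)
L1 = 2                                    # L_Ne1
ne1, ne2 = split_ne(n, L1)
print(recombine(ne1, ne2, L1) == n)

V = 3                                     # assumed number of compensation values
print(V ** len(n), V ** L1)               # table entries: full N_e vs. N_e1 only
```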

A channel can often be described more compactly with an IIR model

$H(z) = \frac{B(z)}{A(z)}, \quad \text{where} \quad A(z) = 1 + \sum_{i=1}^{L_A} a_i z^{-i} \quad \text{and} \quad B(z) = 1 + \sum_{i=1}^{L_B} b_i z^{-i}. \qquad (21)$

Using the IIR model may be another way to reduce the hardware complexity. However, the corresponding IIR precoder also suffers from a timing problem. The iteration bound of the precoder is

$T_{\infty} = 2T_a + T_m. \qquad (22)$

Thus, it is desirable to develop pipelined designs that reduce the iteration bound of IIR precoders.
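For the IIR channel model of equation (21), one possible realization of a straightforward TH precoder folds x(k) + Σ a_i w(k−i) − Σ b_j t(k−j) into [−M, M), where w = x + v. This enforces B(z)t(z) = A(z)(x(z) + v(z)), i.e. t(z) = (x(z) + v(z))/H(z) as in equation (1). The loop arrangement, helper names and coefficients below are illustrative assumptions and are not claimed to reproduce the exact structure of FIG. 6A.

```python
import random

def mod2m(a, M):
    """Modulo device: fold a into [-M, M)."""
    return (a + M) % (2 * M) - M

def past(seq, k, i):
    """seq[k - i], treating samples before time zero as 0."""
    return seq[k - i] if k - i >= 0 else 0.0

def th_precode_iir(x, a, b, M):
    """TH precoder for H(z) = B(z)/A(z); a = [a_1..a_LA], b = [b_1..b_LB].

    Enforces B(z) t(z) = A(z) (x(z) + v(z)) with t(k) in [-M, M).
    """
    t, w = [], []                      # w(k) = x(k) + v(k)
    for k, xk in enumerate(x):
        pre = xk
        pre += sum(ai * past(w, k, i) for i, ai in enumerate(a, start=1))
        pre -= sum(bj * past(t, k, j) for j, bj in enumerate(b, start=1))
        tk = mod2m(pre, M)
        vk = 2 * M * round((tk - pre) / (2 * M))   # compensation, a multiple of 2M
        t.append(tk)
        w.append(xk + vk)
    return t

def iir_channel(t, a, b):
    """Noise-free channel r = (B/A) t: r(k) = t(k) + sum b_j t(k-j) - sum a_i r(k-i)."""
    r = []
    for k, tk in enumerate(t):
        rk = tk + sum(bj * past(t, k, j) for j, bj in enumerate(b, start=1))
        rk -= sum(ai * past(r, k, i) for i, ai in enumerate(a, start=1))
        r.append(rk)
    return r

M = 4
a, b = [-0.6], [0.4]          # hypothetical A(z) = 1 - 0.6 z^-1, B(z) = 1 + 0.4 z^-1
rng = random.Random(2)
x = [rng.choice([-3, -1, 1, 3]) for _ in range(300)]

t = th_precode_iir(x, a, b, M)
r = iir_channel(t, a, b)
x_hat = [round(mod2m(rk, M)) for rk in r]
print(x_hat == x, max(abs(tk) for tk in t) <= M)
```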

FIGS. 6A-6D are block diagrams illustrating an exemplary process of pipelining a TH IIR precoder. FIG. 6A is a block diagram of an IIR TH precoder 120 with $H(z) = \frac{B(z)}{A(z)}$.

FIG. 6B is a block diagram of an equivalent form precoder 130, which is an equivalent form of IIR TH precoder 120.

FIG. 6C is a block diagram of an alternative equivalent form precoder 140, which was generated by redrawing equivalent form precoder 130. The speed of this design is limited by the speed of the IIR filter $\frac{1}{B(z)}$. As described above, pipelining techniques such as the clustered and scattered look-ahead approaches may be applied to remove this bound, resulting in a pipelined equivalent form.

FIG. 6D is a block diagram of a pipelined equivalent form 150, where $N(z) = 1 + \sum_{i=1}^{L_N} n_i z^{-i}$ is a pipelining polynomial. The same techniques described with respect to FIGS. 2C through 2F may then be applied to pipelined equivalent form 150 (FIG. 6D) to pipeline the IIR TH precoder. Additionally, the techniques described with respect to FIG. 5 may be applied to reduce the complexity of the fully pre-computed design. The resulting iteration bounds are the same as those described for FIGS. 3 and 4.

As an alternative to pipelining, a parallel precoder design may be implemented to obtain a high-speed precoder. Two cases are considered: (1) a parallel precoder in which the parallelism level L is less than or equal to the order of the channel, and (2) a parallel precoder in which the parallelism level is larger than the order of the channel.

For the parallel precoder design in which the parallelism level L is less than or equal to the order of the channel, consider the inter-symbol interference (ISI) channel

$H(z) = 1 + h_1 z^{-1} + h_2 z^{-2} + h_3 z^{-3} + h_4 z^{-4} \qquad (23)$

and its corresponding pre-equalizer

$t(n) = -h_1 t(n-1) - h_2 t(n-2) - h_3 t(n-3) - h_4 t(n-4) + x(n). \qquad (24)$

From equation (24), the 2-stage and 3-stage look-ahead equations are derived as

$t(n) = -h_1 t(n-1) - h_2 t(n-2) - h_3 t(n-3) - h_4 t(n-4) + x(n) \qquad (25)$

$t(n) = (h_1^2 - h_2)\,t(n-2) + (h_1h_2 - h_3)\,t(n-3) + (h_1h_3 - h_4)\,t(n-4) + h_1h_4\,t(n-5) - h_1 x(n-1) + x(n) \qquad (26)$

$t(n) = (-h_1^3 + 2h_1h_2 - h_3)\,t(n-3) + (-h_1^2h_2 + h_1h_3 + h_2^2 - h_4)\,t(n-4) + (-h_1^2h_3 + h_1h_4 + h_2h_3)\,t(n-5) + (-h_1^2h_4 + h_2h_4)\,t(n-6) + (h_1^2 - h_2)\,x(n-2) - h_1 x(n-1) + x(n). \qquad (27)$

Substituting n=3k+3, n=3k+4, and n=3k+5 into equations (25), (26), and (27), respectively, yields the following three loop update equations:

$t(3k+3) = -h_1 t(3k+2) - h_2 t(3k+1) - h_3 t(3k) - h_4 t(3k-1) + x(3k+3) \qquad (28)$

$t(3k+4) = (h_1^2 - h_2)\,t(3k+2) + (h_1h_2 - h_3)\,t(3k+1) + (h_1h_3 - h_4)\,t(3k) + h_1h_4\,t(3k-1) - h_1 x(3k+3) + x(3k+4) \qquad (29)$

$t(3k+5) = (-h_1^3 + 2h_1h_2 - h_3)\,t(3k+2) + (-h_1^2h_2 + h_1h_3 + h_2^2 - h_4)\,t(3k+1) + (-h_1^2h_3 + h_1h_4 + h_2h_3)\,t(3k) + (-h_1^2h_4 + h_2h_4)\,t(3k-1) + (h_1^2 - h_2)\,x(3k+3) - h_1 x(3k+4) + x(3k+5). \qquad (30)$
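The closed-form coefficients in equations (25)-(30) can be cross-checked mechanically. The sketch below generates L-stage look-ahead coefficients by repeatedly substituting the one-step recursion (24) into itself, compares the 3-stage result with equation (27), and then runs the 3-parallel block update of equations (28)-(30), without modulo devices, against the serial recursion. The channel taps and helper names are hypothetical.

```python
import random

def lookahead(h, stages):
    """Coefficients of the `stages`-step look-ahead form of t(n) = -sum_j h_j t(n-j) + x(n).

    Returns (c, g) such that t(n) = sum_j c[j] t(n-j) + sum_m g[m] x(n-m).
    """
    base = {j: -hj for j, hj in enumerate(h, start=1)}
    c, g = dict(base), {0: 1.0}
    for s in range(1, stages):            # eliminate t(n-1), ..., t(n-stages+1)
        coef = c.pop(s, 0.0)
        for j, bj in base.items():
            c[s + j] = c.get(s + j, 0.0) + coef * bj
        g[s] = g.get(s, 0.0) + coef
    return c, g

def serial(x, h):
    """Serial recursion (24) with zero initial conditions."""
    t = []
    for n in range(len(x)):
        fb = sum(hj * t[n - j] for j, hj in enumerate(h, start=1) if n - j >= 0)
        t.append(x[n] - fb)
    return t

def parallel3(x, h):
    """Block update of equations (28)-(30): 1-, 2- and 3-stage forms per block of 3."""
    forms = [lookahead(h, s) for s in (1, 2, 3)]
    t = [0.0] * len(x)
    for n0 in range(0, len(x) - 2, 3):
        for i, (c, g) in enumerate(forms):
            n = n0 + i                    # reads only t values from earlier blocks
            acc = sum(cj * t[n - j] for j, cj in c.items() if n - j >= 0)
            acc += sum(gm * x[n - m] for m, gm in g.items() if n - m >= 0)
            t[n] = acc
    return t

h = [0.4, -0.2, 0.1, -0.05]               # hypothetical h_1..h_4
h1, h2, h3, h4 = h
c3, g3 = lookahead(h, 3)
closed_form = {3: -h1**3 + 2*h1*h2 - h3, 4: -h1**2*h2 + h1*h3 + h2**2 - h4,
               5: -h1**2*h3 + h1*h4 + h2*h3, 6: -h1**2*h4 + h2*h4}   # equation (27)
print(all(abs(c3[j] - closed_form[j]) < 1e-12 for j in closed_form))

rng = random.Random(4)
x = [rng.choice([-3, -1, 1, 3]) for _ in range(300)]
print(max(abs(a - b) for a, b in zip(serial(x, h), parallel3(x, h))))
```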

FIG. 7A is a block diagram of an exemplary 3-level parallel IIR filter 160, as described by equations (23)-(30).

FIG. 7B is a block diagram of an exemplary 3-level parallel TH precoder 170, which was generated by introducing modulo devices 172 into the output paths of 3-level parallel IIR filter 160 (FIG. 7A). Modulo devices 172 limit the dynamic range of the outputs of parallel IIR filter 160 (FIG. 7A).

FIG. 7C is a block diagram of an exemplary equivalent form parallel TH precoder 180, which is equivalent to 3-level parallel TH precoder 170 (FIG. 7B). The outputs of equivalent form parallel TH precoder 180, as illustrated in FIG. 7C, are

$t(3k+3) = -h_1 t(3k+2) - h_2 t(3k+1) - h_3 t(3k) - h_4 t(3k-1) + x(3k+3) + v(3k+3) \qquad (31)$

$t(3k+4) = (h_1^2 - h_2)\,t(3k+2) + (h_1h_2 - h_3)\,t(3k+1) + (h_1h_3 - h_4)\,t(3k) + h_1h_4\,t(3k-1) - h_1 x(3k+3) + x(3k+4) + v(3k+4) \qquad (32)$

$t(3k+5) = (-h_1^3 + 2h_1h_2 - h_3)\,t(3k+2) + (-h_1^2h_2 + h_1h_3 + h_2^2 - h_4)\,t(3k+1) + (-h_1^2h_3 + h_1h_4 + h_2h_3)\,t(3k) + (-h_1^2h_4 + h_2h_4)\,t(3k-1) + (h_1^2 - h_2)\,x(3k+3) - h_1 x(3k+4) + x(3k+5) + v(3k+5). \qquad (33)$

From equations (24), (31), (32), and (33), the received signals at times n=3k, n=3k+1, and n=3k+2 are

$r(3k) = x(3k) + v(3k) \qquad (34)$

$r(3k+1) = x(3k+1) + v(3k+1) + h_1 v(3k) \qquad (35)$

$r(3k+2) = x(3k+2) + v(3k+2) + h_1 v(3k+1) + h_2 v(3k). \qquad (36)$

In the presence of additive noise, the received signals become

$r(3k) = x(3k) + v(3k) + n(3k) \qquad (37)$

$r(3k+1) = x(3k+1) + v(3k+1) + h_1 v(3k) + n(3k+1) \qquad (38)$

$r(3k+2) = x(3k+2) + v(3k+2) + h_1 v(3k+1) + h_2 v(3k) + n(3k+2) \qquad (39)$

FIGS. 8A and 8B are block diagrams of exemplary parallel TH precoder designs 190 and 200. Both precoder designs 190 and 200 recover x(3k+i), i=0, 1, 2 from r(3k+i), i=0, 1, 2. The architecture in precoder design 200 (FIG. 8B) is more suitable for applications where x(3k+i), i=0, 1, 2 are coded.
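The recovery of x(3k+i) from r(3k+i) can be illustrated with a noise-free simulation. In the sketch below, each output of the 3-parallel look-ahead update passes through its own modulo device, the result goes through the channel H(z), and a receiver folds r(3k), then subtracts the h_1 and h_2 compensation terms of equations (35) and (36) before folding r(3k+1) and r(3k+2). This successive-cancellation receiver, the helper names and the channel taps are assumptions for illustration and are not claimed to match the exact structure of FIG. 8A or 8B.

```python
import random

def mod2m(a, M):
    """Modulo device: fold a into [-M, M)."""
    return (a + M) % (2 * M) - M

def lookahead(h, stages):
    """(c, g) with t(n) = sum_j c[j] t(n-j) + sum_m g[m] x(n-m), by repeated substitution."""
    base = {j: -hj for j, hj in enumerate(h, start=1)}
    c, g = dict(base), {0: 1.0}
    for s in range(1, stages):
        coef = c.pop(s, 0.0)
        for j, bj in base.items():
            c[s + j] = c.get(s + j, 0.0) + coef * bj
        g[s] = g.get(s, 0.0) + coef
    return c, g

def th_parallel3(x, h, M):
    """3-parallel TH precoder: look-ahead update with a modulo device on each output."""
    forms = [lookahead(h, s) for s in (1, 2, 3)]
    t = [0.0] * len(x)
    for n0 in range(0, len(x) - 2, 3):
        for i, (c, g) in enumerate(forms):
            n = n0 + i
            acc = sum(cj * t[n - j] for j, cj in c.items() if n - j >= 0)
            acc += sum(gm * x[n - m] for m, gm in g.items() if n - m >= 0)
            t[n] = mod2m(acc, M)
    return t

def channel(t, h):
    """Noise-free FIR channel r(n) = t(n) + sum_j h_j t(n-j)."""
    return [t[n] + sum(hj * t[n - j] for j, hj in enumerate(h, start=1) if n - j >= 0)
            for n in range(len(t))]

def receive3(r, h1, h2, M):
    """Per block: cancel h1*v and h2*v terms (equations (35), (36)), then fold."""
    x_hat = []
    for n0 in range(0, len(r) - 2, 3):
        v = []
        for i in range(3):
            isi = (h1 * v[i - 1] if i >= 1 else 0.0) + (h2 * v[i - 2] if i >= 2 else 0.0)
            y = r[n0 + i] - isi                       # = x + v(n0 + i)
            xi = round(mod2m(y, M))
            v.append(2 * M * round((y - xi) / (2 * M)))
            x_hat.append(xi)
    return x_hat

M = 4
h = [0.4, -0.2, 0.1, -0.05]          # hypothetical 4th-order channel taps h_1..h_4
rng = random.Random(3)
x = [rng.choice([-3, -1, 1, 3]) for _ in range(300)]

t = th_parallel3(x, h, M)
r = channel(t, h)
x_hat = receive3(r, h[0], h[1], M)
print(x_hat == x, max(abs(tk) for tk in t) <= M)
```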

The error probability of $\hat{x}(3k+3)$ is $f\left(\frac{S_x}{S_n}\right)$, where f is the error probability function for PAM-M modulation. For $\hat{x}(3k+4)$, there are two causes of a decision error. One is the noise n(3k+4), and the corresponding error rate is $f\left(\frac{S_x}{S_n}\right)$. The other is a decision error on $\hat{v}(3k+3)$. Since the minimum distance between different levels of the compensation signal is M times that between the transmitted symbols x, the error rate of v(3k+3) may be roughly calculated as $f\left(\frac{M^2 S_x}{S_n}\right)$. Thus, the error due to n(3k+4) dominates, and the error rate of $\hat{x}(3k+4)$ may be approximated by $f\left(\frac{S_x}{S_n}\right)$. Similarly, for a large enough M, the error rate of $\hat{x}(3k+5)$ may be approximated by $f\left(\frac{S_x}{S_n}\right)$. Hence, the performance of the parallel precoder is close to that of a straightforward TH precoder.
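The argument above compares f(S_x/S_n) with f(M²S_x/S_n). The sketch below evaluates both using the textbook symbol-error-rate expression for M-level PAM in AWGN, P = 2(1 − 1/M)Q(sqrt(3·SNR/(M² − 1))), with Q(u) = erfc(u/√2)/2. Treating f as this expression and S_x/S_n as a linear signal-to-noise ratio are assumptions, since the text does not specify f.

```python
import math

def q_func(u):
    """Gaussian tail probability Q(u)."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def pam_ser(snr, M):
    """Symbol error rate of M-PAM in AWGN for a linear SNR (average symbol energy / noise variance)."""
    return 2.0 * (1.0 - 1.0 / M) * q_func(math.sqrt(3.0 * snr / (M * M - 1)))

M = 4
for snr_db in (14, 17, 20):
    snr = 10 ** (snr_db / 10)
    p_x = pam_ser(snr, M)               # error rate driven by the noise on x
    p_v = pam_ser(M * M * snr, M)       # rough error rate of the compensation decision
    print(snr_db, p_x, p_v)
```

Under these assumptions the compensation-decision term is many orders of magnitude smaller, consistent with the statement that the noise term dominates.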

For the parallel precoder design in which the parallelism level L is larger than the order of the channel, consider the second-order ISI channel

$H(z) = 1 + h_1 z^{-1} + h_2 z^{-2} \qquad (40)$

and its corresponding zero-forcing pre-equalizer

$t(n) = -h_1 t(n-1) - h_2 t(n-2) + x(n). \qquad (41)$

Its 2-stage, 3-stage, and 4-stage look-ahead equations may be derived as

$t(n) = -h_1 t(n-1) - h_2 t(n-2) + x(n) \qquad (42)$

$t(n) = (h_1^2 - h_2)\,t(n-2) + h_1h_2\,t(n-3) - h_1 x(n-1) + x(n) \qquad (43)$

$t(n) = (2h_1h_2 - h_1^3)\,t(n-3) + (h_2^2 - h_2h_1^2)\,t(n-4) + (h_1^2 - h_2)\,x(n-2) - h_1 x(n-1) + x(n) \qquad (44)$

$t(n) = (h_1^4 - 3h_1^2h_2 + h_2^2)\,t(n-4) + (h_1^3h_2 - 2h_1h_2^2)\,t(n-5) + (2h_1h_2 - h_1^3)\,x(n-3) + (h_1^2 - h_2)\,x(n-2) - h_1 x(n-1) + x(n) \qquad (45)$

Substituting n=4k+4 and n=4k+5 into equations (44) and (45) results in the following loop update equations:

$t(4k+4) = (2h_1h_2 - h_1^3)\,t(4k+1) + (h_2^2 - h_2h_1^2)\,t(4k) + (h_1^2 - h_2)\,x(4k+2) - h_1 x(4k+3) + x(4k+4) \qquad (46)$

$t(4k+5) = (h_1^4 - 3h_1^2h_2 + h_2^2)\,t(4k+1) + (h_1^3h_2 - 2h_1h_2^2)\,t(4k) + (2h_1h_2 - h_1^3)\,x(4k+2) + (h_1^2 - h_2)\,x(4k+3) - h_1 x(4k+4) + x(4k+5) \qquad (47)$

The outputs t(4k+2) and t(4k+3) are computed incrementally as follows:

$t(4k+2) = -h_1 t(4k+1) - h_2 t(4k) + x(4k+2) \qquad (48)$

$t(4k+3) = -h_1 t(4k+2) - h_2 t(4k+1) + x(4k+3) \qquad (49)$
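When the parallelism level exceeds the channel order, only t(4k+4) and t(4k+5) need look-ahead, while t(4k+2) and t(4k+3) are computed incrementally. The sketch below, with hypothetical taps h_1 and h_2 and no modulo devices, runs equations (46)-(49) block by block and compares the result with the serial recursion (41); a match indicates that feeding back only t(4k) and t(4k+1) across blocks reproduces the serial filter.

```python
import random

def serial2(x, h1, h2):
    """Equation (41) with zero initial conditions."""
    t = []
    for n, xn in enumerate(x):
        t1 = t[n - 1] if n >= 1 else 0.0
        t2 = t[n - 2] if n >= 2 else 0.0
        t.append(-h1 * t1 - h2 * t2 + xn)
    return t

def parallel4(x, h1, h2):
    """Equations (46)-(49): block k yields t(4k+2)..t(4k+5) from t(4k) and t(4k+1)."""
    t = {}

    def X(n):
        return x[n] if 0 <= n < len(x) else 0.0

    def T(n):
        return t.get(n, 0.0)

    a2 = h1 ** 2 - h2
    c3, c4 = 2 * h1 * h2 - h1 ** 3, h2 ** 2 - h2 * h1 ** 2
    d4, d5 = h1 ** 4 - 3 * h1 ** 2 * h2 + h2 ** 2, h1 ** 3 * h2 - 2 * h1 * h2 ** 2
    for k in range(-1, (len(x) - 2) // 4 + 1):
        b = 4 * k
        t[b + 2] = -h1 * T(b + 1) - h2 * T(b) + X(b + 2)                         # (48)
        t[b + 3] = -h1 * T(b + 2) - h2 * T(b + 1) + X(b + 3)                     # (49)
        t[b + 4] = c3 * T(b + 1) + c4 * T(b) + a2 * X(b + 2) - h1 * X(b + 3) + X(b + 4)  # (46)
        t[b + 5] = (d4 * T(b + 1) + d5 * T(b) + c3 * X(b + 2) + a2 * X(b + 3)
                    - h1 * X(b + 4) + X(b + 5))                                 # (47)
    return [t[n] for n in range(len(x))]

h1, h2 = 0.5, -0.3                    # hypothetical channel taps
rng = random.Random(5)
x = [rng.choice([-3, -1, 1, 3]) for _ in range(400)]
print(max(abs(a - b) for a, b in zip(serial2(x, h1, h2), parallel4(x, h1, h2))))
```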

FIG. 9 is a block diagram of an exemplary architecture of a parallel IIR filter 210, corresponding with equations (40)-(49).

FIGS. 10 and 11 are block diagrams of exemplary TH precoder architectures corresponding with parallel IIR filter 210. TH precoder architectures 220 (FIG. 10) and 230 (FIG. 11) limit the output of the zero-forcing pre-equalizer. TH precoder architecture 230 is an equivalent form of TH precoder architecture 220.

The outputs of TH precoder architecture 220 are

$t(4k+2) = -h_1 t(4k+1) - h_2 t(4k) + x(4k+2) + v(4k+2) \qquad (50)$

$t(4k+3) = -h_1 t(4k+2) - h_2 t(4k+1) + x(4k+3) + v(4k+3) \qquad (51)$

$t(4k+4) = (2h_1h_2 - h_1^3)\,t(4k+1) + (h_2^2 - h_2h_1^2)\,t(4k) + (h_1^2 - h_2)\,x(4k+2) - h_1 x(4k+3) + x(4k+4) + v(4k+4) \qquad (52)$

$t(4k+5) = (h_1^4 - 3h_1^2h_2 + h_2^2)\,t(4k+1) + (h_1^3h_2 - 2h_1h_2^2)\,t(4k) + (2h_1h_2 - h_1^3)\,x(4k+2) + (h_1^2 - h_2)\,x(4k+3) - h_1 x(4k+4) + x(4k+5) + v(4k+5) \qquad (53)$

At the receiver side, the received signals are derived as:

$\begin{matrix} {r(4k+2) = x(4k+2) + v(4k+2)} & (54) \\ {r(4k+3) = x(4k+3) + v(4k+3)} & (55) \\ {r(4k+4) = x(4k+4) + v(4k+4) + h_{1}v(4k+3) + (h_{2} - h_{1}^{2})\,v(4k+2)} & (56) \\ {r(4k+5) = x(4k+5) + v(4k+5) + h_{1}v(4k+4) + h_{2}v(4k+3) - h_{1}h_{2}v(4k+2)} & (57) \end{matrix}$

Because the look-ahead expressions (52) and (53) do not include the compensation values v(4k+2) and v(4k+3), these values reappear as residual terms in r(4k+4) and r(4k+5).

In the presence of additive noise n(·), the received signals become:

$\begin{matrix} {r(4k+2) = x(4k+2) + v(4k+2) + n(4k+2)} & (58) \\ {r(4k+3) = x(4k+3) + v(4k+3) + n(4k+3)} & (59) \\ {r(4k+4) = x(4k+4) + v(4k+4) + h_{1}v(4k+3) + (h_{2} - h_{1}^{2})\,v(4k+2) + n(4k+4)} & (60) \\ {r(4k+5) = x(4k+5) + v(4k+5) + h_{1}v(4k+4) + h_{2}v(4k+3) - h_{1}h_{2}v(4k+2) + n(4k+5)} & (61) \end{matrix}$
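The residual terms in the received samples can be tabulated numerically. The Python sketch below is a minimal check, not part of the described architecture: it treats the compensation values v(4k+2) through v(4k+5) as external inputs to (50)-(53), passes the block through an assumed noise-free channel 1 + h1 z^-1 + h2 z^-2 (the response that the recursion of (48)-(49) inverts), and drives one v at a time with a unit impulse. Entry L[i][j] is the weight with which v(4k+2+j) appears in r(4k+2+i) − x(4k+2+i), which can be compared term by term with (54)-(57); the noise n(·) in (58)-(61) simply adds on top. Function names and tap values are hypothetical.

```python
def block_outputs(t0, t1, x, v, h1, h2):
    """t(4k+2..4k+5) from (50)-(53), with v(4k+2..4k+5) supplied as inputs
    instead of being chosen by the modulo devices."""
    x2, x3, x4, x5 = x
    v2, v3, v4, v5 = v
    t2 = -h1 * t1 - h2 * t0 + x2 + v2
    t3 = -h1 * t2 - h2 * t1 + x3 + v3
    t4 = ((2*h1*h2 - h1**3) * t1 + (h2**2 - h2*h1**2) * t0
          + (h1**2 - h2) * x2 - h1 * x3 + x4 + v4)
    t5 = ((h1**4 - 3*h1**2*h2 + h2**2) * t1 + (h1**3*h2 - 2*h1*h2**2) * t0
          + (2*h1*h2 - h1**3) * x2 + (h1**2 - h2) * x3 - h1 * x4 + x5 + v5)
    return [t2, t3, t4, t5]

def received(t0, t1, tb, h1, h2):
    """Noise-free r(4k+2..4k+5) through the assumed channel 1 + h1 z^-1 + h2 z^-2."""
    s = [t0, t1] + tb                      # t(4k), t(4k+1), t(4k+2), ..., t(4k+5)
    return [s[i] + h1 * s[i - 1] + h2 * s[i - 2] for i in range(2, 6)]

def leakage_matrix(h1, h2):
    """L[i][j]: weight of v(4k+2+j) in r(4k+2+i) - x(4k+2+i).
    Zero state and zero inputs isolate each v because everything above is linear
    and the look-ahead terms cancel the channel's dependence on t(4k), t(4k+1) and x."""
    L = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        unit = [0.0] * 4
        unit[j] = 1.0
        r = received(0.0, 0.0, block_outputs(0.0, 0.0, [0.0] * 4, unit, h1, h2), h1, h2)
        for i in range(4):
            L[i][j] = r[i]
    return L

h1, h2 = 0.5, -0.3                         # hypothetical channel taps
for row in leakage_matrix(h1, h2):
    print([round(c, 6) for c in row])
```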

FIGS. 12 and 13 are block diagrams of exemplary receiver architectures with parallel precoders. Both receiver architecture 240 (FIG. 12) and receiver architecture 250 (FIG. 13) recover x(4k+i), i=0, 1, 2, 3 from r(4k+i), i=0, 1, 2, 3. Receiver architecture 250 is more suitable for applications where x(4k+i), i=0, 1, 2, 3 are coded.
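The receiver architectures themselves are given only in the figures. As a rough illustration of the decision-directed idea in claims 11-13 (decode an earlier symbol, form its compensation value, and subtract scaled versions of it when decoding later symbols), the Python sketch below decodes the output of a 4-parallel precoder built from (50)-(53). It is a hypothetical sketch, not receiver architecture 240 or 250: it assumes 4-PAM inputs, a modulo range of [−M, M) with M = 4, the channel 1 + h1 z^-1 + h2 z^-2, zero initial state, and no noise. The cancellation terms are obtained by tracking how far each precoded output deviates from the pure look-ahead output, and all function names are illustrative.

```python
import math
import random

M = 4  # assumed 4-PAM alphabet {-3, -1, 1, 3}; modulo range [-M, M)

def modulo(u, m=M):
    """Fold u into [-m, m); return (folded, v) with v a multiple of 2m."""
    v = -2 * m * math.floor((u + m) / (2 * m))
    return u + v, v

def parallel_thp(x, h1, h2):
    """4-parallel TH precoder per (50)-(53); t(0), t(1) seeded serially."""
    t = [modulo(x[0])[0]]
    t.append(modulo(-h1 * t[0] + x[1])[0])
    for b in range(2, len(x), 4):                      # b = 4k + 2
        t0, t1 = t[b - 2], t[b - 1]
        x2, x3, x4, x5 = x[b:b + 4]
        t2 = modulo(-h1 * t1 - h2 * t0 + x2)[0]
        t3 = modulo(-h1 * t2 - h2 * t1 + x3)[0]
        t4 = modulo((2*h1*h2 - h1**3) * t1 + (h2**2 - h2*h1**2) * t0
                    + (h1**2 - h2) * x2 - h1 * x3 + x4)[0]
        t5 = modulo((h1**4 - 3*h1**2*h2 + h2**2) * t1 + (h1**3*h2 - 2*h1*h2**2) * t0
                    + (2*h1*h2 - h1**3) * x2 + (h1**2 - h2) * x3 - h1 * x4 + x5)[0]
        t += [t2, t3, t4, t5]
    return t

def channel(t, h1, h2):
    """Assumed noise-free channel 1 + h1 z^-1 + h2 z^-2 with zero initial state."""
    return [t[n] + (h1 * t[n - 1] if n >= 1 else 0.0) + (h2 * t[n - 2] if n >= 2 else 0.0)
            for n in range(len(t))]

def decide(y):
    """Fold y, slice to the nearest odd level, and return (x_hat, v_hat)."""
    folded, _ = modulo(y)
    x_hat = 2 * math.floor(folded / 2) + 1            # nearest point of {-3, -1, 1, 3}
    v_hat = 2 * M * round((y - x_hat) / (2 * M))      # compensation is a multiple of 2M
    return x_hat, v_hat

def parallel_receiver(r, h1, h2):
    """Hypothetical decision-directed receiver for the 4-parallel precoder."""
    x_hat = [decide(r[0])[0], decide(r[1])[0]]        # serially seeded samples: r = x + v
    for b in range(2, len(r), 4):
        d = [0.0, 0.0]  # deviations of t(4k), t(4k+1) from the look-ahead output: zero
        for i in range(4):
            # (51) builds t(4k+3) from t(4k+2), which already carries v(4k+2)
            carry = -h1 * d[2] if i == 1 else 0.0
            y = r[b + i] - carry - h1 * d[-1] - h2 * d[-2]   # ideally x(n) + v(n)
            xn, vn = decide(y)
            x_hat.append(xn)
            d.append(carry + vn)                              # deviation of t(n)
    return x_hat

random.seed(2)
h1, h2 = 0.5, -0.3                                     # hypothetical channel taps
x = [random.choice([-3, -1, 1, 3]) for _ in range(2 + 4 * 50)]
r = channel(parallel_thp(x, h1, h2), h1, h2)
print(parallel_receiver(r, h1, h2) == x)               # expected: True (noise-free)
```

With additive noise, the same structure applies, but a wrong decision on an early compensation value propagates into the later samples of the same block, which is one reason the error probability of the later outputs is only approximately that of the first two.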

The error probability of $\hat{x}(4k+2)$ and $\hat{x}(4k+3)$ is $f\left(\frac{S_{x}}{S_{n}}\right)$. The error probability of $\hat{x}(4k+4)$ and $\hat{x}(4k+5)$ may also be approximated by $f\left(\frac{S_{x}}{S_{n}}\right)$. Hence, the performance of the parallel precoder is close to that of a straightforward TH precoder.

Various embodiments of the invention have been described. These and other embodiments are within the scope of the following claims. 

1. A precoder, comprising: a modulo device to output precoded symbols and a compensation signal having N possible values; a first pipelined finite impulse response (FIR) filter to receive an input signal and to provide an output to the modulo device; a first filter loop, including an FIR filter, a first delay element, and the modulo device, to receive the compensation signal and at least N precomputed values corresponding to the FIR filter, and to provide an output to the modulo device; and a second filter loop, including a second pipelined FIR filter, a second delay element, and the modulo device, to receive the precoded symbols and to provide an output to the modulo device.
 2. The precoder of claim 1, wherein the first filter loop includes a multiplexer to select one of the at least N precomputed values in response to the compensation signal and to output the selected precomputed value to the modulo device.
 3. The precoder of claim 1, wherein: the first filter loop includes an inner filter loop associated with a corresponding set of one or more filter taps, and an outer filter loop associated with a corresponding set of one or more filter taps, the inner filter loop and the outer filter loop each to provide a corresponding output to the modulo device; and the inner filter loop is configured to receive the at least N precomputed values and to selectively output one of the N precomputed values in response to the compensation signal.
 4. The precoder of claim 3, wherein the inner filter loop includes a multiplexer to select the one of the N precomputed values in response to the compensation signal and to output the selected precomputed value to the modulo device.
 5. The precoder of claim 1, wherein the precoder comprises a Tomlinson-Harashima precoder.
 6. The precoder of claim 1, wherein the second filter loop is configured as a scattered look-ahead pipelined IIR filter.
 7. The precoder of claim 1, wherein the second filter loop is configured as a clustered look-ahead pipelined IIR filter.
 8. A parallel Tomlinson-Harashima precoder, comprising: N modulo devices to perform N modulo operations in parallel and to output N precoded symbols in parallel, wherein N is a positive integer greater than one; wherein at least one of the N modulo devices is configured to receive a sum of one of a plurality of input signals and a first set of scaled versions of j previously precoded symbols, wherein j is a positive integer greater than one; and wherein at least one of the N modulo devices is configured to receive a sum of the plurality of the input signals, including a scaled version of one or more of the input signals and a second set of scaled versions of j previously precoded symbols.
 9. The parallel Tomlinson-Harashima precoder of claim 8, wherein j is greater than N, and wherein: a first one of the N modulo devices is configured to receive a sum of a first one of the input signals and the first set of scaled versions of j previously precoded symbols; and one or more remaining ones of the N modulo devices are each configured to receive a corresponding sum of the plurality of the input signals, including a scaled version of one or more of the input signals and a corresponding set of scaled versions of j previously precoded symbols.
 10. The parallel Tomlinson-Harashima precoder of claim 8, wherein j is not greater than N, and wherein: each of a first subset of the N modulo devices is configured to receive a corresponding sum of the plurality of input signals, including a scaled version of one or more of the input signals and a corresponding set of scaled versions of j previously precoded symbols; and each modulo device of a second subset of the N modulo devices is configured to receive a sum of a corresponding one of the input signals and a corresponding set of scaled versions of j previously precoded symbols.
 11. A receiver, comprising: N modulo devices to decode K precoded symbols in parallel, wherein N and K are positive integers greater than one, wherein the N modulo devices include a first modulo device to decode a first one of the K precoded symbols and to generate a first compensation signal, and a second modulo device to decode a second one of the K precoded symbols from a sum of the second precoded symbol and a scaled version of the first compensation signal.
 12. The receiver of claim 11, wherein K is not greater than N.
 13. The receiver of claim 11, wherein K is greater than N, wherein the second modulo device is configured to generate a second compensation signal, and wherein the receiver is configured to decode a K-th one of the symbols as a sum of the K-th symbol and scaled versions of the first through (K−1)-th compensation signals.